VLIW processor with copy register file

ABSTRACT

A computer program is executed in a VLIW processor, which contains a plurality of functional units and a plurality of register files that are each coupled to a respective subset of the functional units. When a first instruction is executed that results in writing of a result to a register file in a register addressed by a result address from the first instruction, the result is copied to a copy register in a copy register file. The copy register is selected dependent on the register file to which the result was written, but at least partially independent of the result address, so that results written to different addressed registers in the register file are copied to the same register in the copy register file. Subsequently a copy instruction may be executed to copy the result from the copy register file to a second register file, from which the result may be used as operand of another instruction.

The invention relates to a data processing device with instruction words that contain instructions for a plurality of functional units in parallel, such as a Very Long Instruction Word (VLIW) processing device.

VLIW processors contain a plurality of functional units that are capable of executing instructions from a program. The instructions are issued as instruction words that contain instructions for a plurality of functional units in parallel. Operand data is passed between the functional units by means of register files. Each register file contains a set of registers and a number of read and write ports for accessing selected registers. Each functional unit (or group of functional units) is coupled to a different set of ports. Thus, a functional unit is able to read operands produced by other functional units and to write results for use by the other functional units.

In practice a VLIW processor may contain a very large number of functional units. This makes it impracticable to couple all functional units to a single register file. As an alternative architecture it has been proposed to group the functional units into clusters. For each cluster a register file is provided, so that all functional units in a cluster are coupled to ports of the register file of this cluster. In this architecture the results produced by a particular functional unit can only be read from the register file by functional units that belong to the same cluster as the particular functional unit. The idea behind this is that instructions from different tasks that require exchange of results are generally executed only by subsets of the functional units, i.e. functional units in a particular cluster. Therefore no connections to register files outside the cluster are needed for those tasks.

Nevertheless, there sometimes remains a need to exchange a limited number of operands and results between functional units in different clusters. Various solutions have been proposed to transport data from one register file to another, so that results produced by functional units in one cluster can be made available to functional units in another cluster.

U.S. Pat. No. 6,269,437 discloses a processor with a plurality of register files and a duplicator. The duplicator executes instructions which specify source and target registers in different register files. The duplicator is coupled to read and write ports of the register files. In response to the instructions the duplicator copies data from the source registers to the target registers.

When a program for the processor is compiled the compiler generates a collection of instructions for the various functional units and determines dependency relations between instructions that produce and use certain results respectively. The compiler determines when such a dependency relation exists between instructions that are executed by functional units that do not belong to the same cluster (are not coupled to ports of a common register file). In this case, the compiler generates an instruction for the duplicator to copy the result of the producing instruction to a register in a register file that is coupled to the functional unit that executes the using instruction.

This technique imposes additional scheduling constraints on the generation of instruction words. After execution of the producing instruction, the copy instruction has to be scheduled, followed by the using instruction. The registers involved must remain allocated at least until the relevant instructions have been executed. This reduces the efficiency of the processor.

Among others, it is an object of the invention to provide for increased efficiency of a data processing device with a plurality of functional units that can execute instructions from an instruction word in parallel, using registers distributed over different register files.

The data processing device according to the invention is set forth in claim 1. According to the invention a special copy register file is provided which acts as a source of operands for a copy functional unit. Results that are written to registers in register files are copied to the copy register file as part of execution of the instructions that produce the results, i.e. without requiring additional instructions. The copy functional unit is controlled by instructions from the instruction words. The instructions for the copy functional unit indicate which results need to be copied from the copy register file to other register files.

Preferably, the copy register file is coupled to at least part of the ports of the register files via a port coupling link, arranged to copy data written to respective ones of the ports each to a respective register in the copy register file, the respective register being selected dependent on the respective one of the ports but at least partially irrespective of the register address with which the data is supplied to the respective one of the ports. Thus, only a limited number of copy registers is needed in the copy register file per source register file, fewer than the total number of registers in the source register file. Preferably, the copy register is selected completely independently of the register address.

In principle, each result that is written to a normal register file may automatically be copied to the copy register file. However, this may lead to overwriting of previous results that still need to be copied from the copy register file. To prevent unneeded copying, an embodiment of the data processing apparatus according to the invention uses instructions that comprise a field for indicating whether the result of the instruction must be copied to the copy register file, the port coupling link being arranged to copy the data dependent on a value in said field. Thus, unnecessary overwriting can be prevented by the program, leaving more time for copy instructions for copying from the copy register file.

A primary application of the invention is copying of results between register files for functional units that do not have ports coupled to the same register file. A further application is reduction of pressure on register use, i.e. temporary saving of data outside a register file, so as to make registers in the register file available for other data. In this case the copy register file is used to save a result that is overwritten in the register file to which it was originally written, the result being written back to that register file later when the result is needed. Writing back may be performed directly from the copy register file, or after storage in memory or in another register file.

These and other objects and advantageous aspects of the data processing device, method of data processing and method of compiling instruction words will be set forth using the following figures.

FIG. 1 shows a data processing device

FIG. 2 shows a copy register file

FIG. 3 shows a flow chart for generating a program for the processing device

FIG. 1 shows a data processing device with an instruction issue unit 10, functional units 12 a-d, a copy functional unit 14, register files 16 a,b and a copy register file 18. Instruction issue unit 10 has issue slot connections coupled to the functional units 12 a-d and the copy functional unit 14. Instruction issue unit 10 is designed to issue instruction words that each contain a combination of instructions, each for a respective one of the functional units 12 a-d and the copy functional unit 14. For this purpose, instruction issue unit 10 generally contains an instruction memory, a program counter and optional instruction decompression circuitry, but since these are well known and not relevant to the invention they are not shown separately.

A first and second functional unit 12 a,b have operand inputs coupled to read ports 15 a,b of a first register file 16 a. First and second functional unit 12 a,b have result outputs coupled to write ports 17 a,b of first register file 16 a. Similarly, a third and fourth functional unit 12 c,d have operand inputs coupled to read ports of a second register file 16 b. Third and fourth functional unit 12 c,d have result outputs coupled to write ports 17 c,d of second register file 16 b. The read and write ports comprise a register addressing part (not shown separately) to address registers in the register files 16 a,b under control of register selection fields in the instructions. The read ports each comprise a register content connection (not shown separately) for feeding contents of an addressed register to a functional unit 12 a-d. The write ports each comprise a result connection (not shown separately) for feeding a result from a functional unit 12 a-d to the register file 16 a,b.

Functional units 12 a-d may be of any type, such as for example Arithmetic Logic Units (ALU), or memory access units etc. Although only a limited number of functional units has been shown, it will be understood that in practice many more functional units may be provided. Similarly, a greater number of register files may be provided. As shown, each register file 16 a,b defines a cluster of functional units 12 a-d that is connected to the register file. By way of example, each functional unit 12 a-d is coupled to one register file only, but it should be understood that, by means of register file selection hardware, some functional units 12 a-d may have their inputs and/or outputs coupled to more than one register file 16 a,b, so that registers in any one of those register files may be selected for reading and/or writing from those functional units under control of instructions. However, preferably each functional unit 12 a-d is coupled to one register file only, since connection to multiple register files increases the number of required ports, instruction width, hardware costs and delay.

Although the invention will be described in terms of functional units 12 a-d, it will be understood that one or more of functional units 12 a-d may be replaced by a group of functional units that share the same read and write ports and execute instructions alternately.

Copy register file 18 has inputs coupled to each of the write ports 17 a-d of register files 16 a,b. Copy register file 18 furthermore has a read port connected to an operand input of copy functional unit 14. Copy functional unit 14 has result outputs 19 a,b coupled to respective ones of register files 16 a,b.

In operation, instruction issue unit 10 issues instruction words that each contain a combination of instructions, each for a respective one of the functional units 12 a-d and the copy functional unit 14. The instructions for functional units 12 a-d typically contain an operation code, a first and second operand register selection code and a result register selection code. The operation code commands the functional unit to select a specific operation type and the operand register selection codes and result register selection codes are supplied to the ports of the register files to select operand and result registers respectively.

When results are written into register files 16 a,b at least some of the results are automatically copied into registers of copy register file 18. Copy functional unit 14 executes instructions to copy contents of registers in copy register file 18 to addressed registers in register files 16 a,b.

Instructions for copy functional unit 14 typically contain an address of an operand register in copy register file 18 that contains operand data and a specification of a result register to which the data should be copied. The specification of the result register typically contains a register file selection field and a register selection field, for addressing a selected register file 16 a,b and a register in that register file 16 a,b respectively. In response to the instruction, data is copied from the addressed register with operand data to the addressed register in the selected register file 16 a,b (it should be realized that in practice there will be many more than two register files 16 a,b to select from).

Thus, execution of copy instructions issued by instruction issue unit 10 to copy functional unit 14 may be used to make a result of an operation executed by an originating functional unit 12 a-d available for use as operand by a using functional unit 12 a-d that is not coupled to the same register file 16 a,b as the originating functional unit 12 a-d.
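By way of illustration only, the data path described above can be sketched in software. The following is a minimal behavioural model, not a description of the hardware: the class name ToyMachine, the port and register file labels used as dictionary keys, and the number of registers per file are all assumptions made for the example.

```python
class ToyMachine:
    """Behavioural sketch: two register files, four write ports, one copy register per port."""

    # Assumed mapping of write ports 17 a-d to register files 16 a,b.
    PORT_TO_FILE = {"17a": "16a", "17b": "16a", "17c": "16b", "17d": "16b"}

    def __init__(self, regs_per_file=8):
        self.register_files = {name: [0] * regs_per_file for name in ("16a", "16b")}
        self.copy_registers = {port: 0 for port in self.PORT_TO_FILE}

    def write_result(self, port, reg_addr, value):
        """A functional unit writes a result through one of the write ports."""
        self.register_files[self.PORT_TO_FILE[port]][reg_addr] = value
        # The copy register is chosen by the port only, not by reg_addr.
        self.copy_registers[port] = value

    def copy_instruction(self, copy_reg, dest_file, dest_reg):
        """Copy functional unit: copy register -> addressed register in the selected file."""
        self.register_files[dest_file][dest_reg] = self.copy_registers[copy_reg]


m = ToyMachine()
m.write_result("17a", reg_addr=3, value=42)             # producer in the first cluster
m.copy_instruction("17a", dest_file="16b", dest_reg=5)  # make it visible to the second cluster
print(m.register_files["16b"][5])  # 42
```

In this sketch a later write through port "17a" to any other register address would overwrite the same copy register, which is the behaviour discussed with reference to FIG. 2.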

In an alternative embodiment, copy functional unit 14 copies to predetermined registers in register files 16 a,b. In this case no result register address is needed in instructions for copy functional unit 14. Also, copy functional unit 14 may broadcast the copies to all register files 16 a,b in parallel. In this case no register file selection field is needed in instructions for copy functional unit 14, but, of course, this may lead to needless register overwriting in many applications where the copy is needed in only one or part of the register files 16 a,b.

FIG. 2 shows an embodiment of copy register file 18 (shown in FIG. 1). This embodiment contains a multiplexer 20 and a plurality of registers 22 a-d. The data parts of write ports 17 a-d of the register files 16 a,b are coupled to inputs 28 a-d of respective ones of registers 22 a-d. Outputs of registers 22 a-d are coupled to an operand input 26 of copy functional unit 14 (not shown) via multiplexer 20. A control input 24 is used for receiving operand addresses from copy instructions for copy functional unit 14.

In operation, when a functional unit 12 a-d writes a result to a write port 17 a-d, the result is automatically also written into the register 22 a-d for that write port 17 a-d. Under control of copy instructions operand data from selected ones of registers 22 a-d is supplied to the operand input 26 of copy functional unit 14.

It will be realized that all data from a particular write port 17 a-d is copied to the same register 22 a-d for that particular write port 17 a-d in copy register file 18, irrespective of the selected register in the register file 16 a,b of the write port 17 a-d. Thus, the number of registers 22 a-d in copy register file 18 is much smaller than the sum of the numbers of registers in register files 16 a,b, so that registers 22 a-d in copy register file 18 can be addressed with a small address field. The price for this is that, without further measures, the content of registers 22 a-d must be copied to the other register files 16 a,b before it is overwritten.
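The saving in address field width can be illustrated with a short calculation. The figures used below (32 registers per register file, two register files, four write ports) are assumptions chosen for the example and do not appear in the description; the helper address_bits is likewise hypothetical.

```python
def address_bits(register_count):
    """Number of bits needed to address register_count registers."""
    return max(1, (register_count - 1).bit_length())


REGS_PER_FILE = 32   # assumed size of each register file 16 a,b
NUM_FILES = 2        # register files 16 a,b
NUM_WRITE_PORTS = 4  # write ports 17 a-d, one copy register each

# Source operand able to name any register in any register file.
full_address_bits = address_bits(REGS_PER_FILE * NUM_FILES)
# Source operand naming one of the copy registers 22 a-d.
copy_address_bits = address_bits(NUM_WRITE_PORTS)

print(full_address_bits, copy_address_bits)  # 6 2
```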

Preferably, the instruction words from instruction issue unit 10 control whether or not result data is copied into registers 22 a-d in copy register file 18. This may be realized for example by augmenting instructions for functional units 12 a-d with copy control information, such as a copy control bit in each particular instruction to indicate whether or not the result of the particular instruction should be copied to the relevant register 22 a-d in copy register file 18 when the functional unit 12 a-d writes the result of the particular instruction to its register file 16 a,b. In this case, the copy control bit for the particular instruction is fed to a write enable input (not shown in FIG. 2) of the register 22 a-d for the write port 17 a-d of the functional unit 12 a-d that executes the instruction. Use of the copy control bits makes it possible to delay overwriting of data in registers 22 a-d, so that the instruction for copy functional unit 14 to copy the data from a register 22 a-d may be delayed, for example when data from another register 22 a-d must be copied first.
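The effect of the copy control bit can be sketched as a register with a write enable input. The class and method names below are hypothetical and serve only to illustrate the behaviour described above.

```python
class CopyRegister:
    """Sketch of one copy register (such as 22 a) with a write enable input."""

    def __init__(self):
        self.value = None

    def observe_write(self, result, copy_bit):
        # The copy control bit of the producing instruction acts as write enable.
        if copy_bit:
            self.value = result


reg_22a = CopyRegister()
reg_22a.observe_write(result=10, copy_bit=True)   # result that must reach another cluster
reg_22a.observe_write(result=20, copy_bit=False)  # local results leave the copy untouched
reg_22a.observe_write(result=30, copy_bit=False)
print(reg_22a.value)  # 10
```

The value 10 stays available over the two later writes, so in this sketch the copy instruction that reads it could be scheduled in any of the following instruction words.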

In an alternative embodiment each register 22 a-d of copy register file 18 for a write port 17 a-d may be replaced by a plurality of registers. In this case, copy instructions for copy functional unit 14 contain selection codes for selecting among the pluralities of registers for the respective write ports 17 a-d. Results from write ports 17 a-d are copied into different ones of this plurality of registers for the write port 17 a-d in round robin fashion. Thus, overwriting of data in registers 22 a-d is delayed even without copy control bits.
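The round robin alternative can be sketched as follows. The class RoundRobinCopySet, the set size of two and the use of an index as selection code are assumptions made for the example.

```python
class RoundRobinCopySet:
    """Sketch of a plurality of copy registers serving one write port."""

    def __init__(self, size):
        self.registers = [None] * size
        self.next_index = 0

    def observe_write(self, result):
        # Each write through the port lands in the next register of the set.
        self.registers[self.next_index] = result
        self.next_index = (self.next_index + 1) % len(self.registers)

    def read(self, selection_code):
        # The selection code from the copy instruction picks a register in the set.
        return self.registers[selection_code]


port_17a = RoundRobinCopySet(size=2)
for result in (10, 20, 30):
    port_17a.observe_write(result)
print(port_17a.registers)  # [30, 20]
```

With a set of two registers, a result survives one further write through the same port before it is overwritten, here without any copy control bit.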

Although separate registers 22 a-d have been shown for respective ones of write ports 17 a-d, shared registers (or sets of registers) may be provided for groups of write ports, for example all write ports of a register file 16 a,b. In each instruction cycle data from only one of the group of write ports 17 a-d is written to the register (or one of the registers) for the group of write ports. This reduces the number of connections to registers in copy register file 18. By means of copy control bits, for example, it may be controlled from which of the write ports in the relevant group of write ports data is copied.
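A shared copy register for a group of write ports can be sketched as below. The function name and the representation of one instruction cycle as a dictionary are assumptions, as is the choice to treat two simultaneous copy requests within one group as an error; the description only states that data from one port of the group is written per cycle.

```python
def update_shared_copy_register(current, writes):
    """Return the new content of a copy register shared by a group of write ports.

    writes maps a port name to (result, copy_bit) for one instruction cycle.
    """
    selected = [result for result, copy_bit in writes.values() if copy_bit]
    if len(selected) > 1:
        # Assumed policy: the program must not request two copies in one group per cycle.
        raise ValueError("at most one port of the group may copy per cycle")
    return selected[0] if selected else current


shared = None
shared = update_shared_copy_register(shared, {"17a": (10, False), "17b": (20, True)})
shared = update_shared_copy_register(shared, {"17a": (30, False), "17b": (40, False)})
print(shared)  # 20
```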

As an alternative, separate registers 22 a-d may be provided for different groups of registers in the same register file. In this case, a part of the register address which is supplied to the write port 17 a-d is also supplied to the copy register file 18 to select the appropriate register 22 a-d in the copy register file 18. This reduces the average frequency with which the registers 22 a-d in the copy register file 18 are overwritten, giving copy functional unit 14 more time to copy data. By adapting the allocation of registers to different results during a compilation phase so that later needed data is not overwritten in copy register file 18 it can be ensured that this data remains available. The entire register address is not needed for this purpose: only a subset of e.g. one or more of the bits suffices to select a register in copy register file 18 for this purpose.
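Selection of a copy register by part of the register address can be sketched as follows. Which address bits are used is not specified in the description; taking the low-order bits is an assumption of this example, and the function name is hypothetical.

```python
def copy_register_index(register_address, select_bits=1):
    """Select a copy register from the low-order bits of the result register address."""
    return register_address & ((1 << select_bits) - 1)


# With one selection bit, even and odd register addresses map to different copy registers.
print([copy_register_index(addr) for addr in range(4)])  # [0, 1, 0, 1]
```

Under this assumption a compiler could, for instance, place a result that must be copied in an even register and the results that follow it in odd registers, so that the copy is not overwritten until the copy instruction has been executed.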

FIG. 3 shows a flow-chart of a process for generating instruction words for the processing device of FIG. 1. Such a process for generating instructions may be executed by any computer, including the device of FIG. 1. The process results in a set of instruction words stored in instruction issue unit 10 for execution by functional units 12 a-d and copy functional unit 14, possibly after intermediate storage on some medium such as a magnetic or optical disk.

In a first step 31 a specification of a program is received in some form or another, for example in a high level language such as C. In the first step this program is converted into a specification of a set of machine operations that have to be executed by functional units 12 a-d to implement the program and a specification of the data dependencies between these operations. In a second step 32, the operations are assigned to functional units 12 a-d and scheduled by assignment to different instruction words. In general not all functional units 12 a-d are capable of executing all operations; therefore assignment of operations to functional units 12 a-d is constrained by the capabilities of the functional units 12 a-d. Furthermore, assignment is directed to distribute instructions over different functional units so as to minimize the number of instruction words that need to be executed. In addition second step 32 assigns registers to the results of the operations.

In third, fourth and fifth steps 33, 34, 35 the instructions in the instruction words are processed one by one to ensure availability of the operands of the instructions. In fourth step 34 it is tested whether the functional unit 12 a-d that produces the operand of the instruction is coupled to the same register file as the functional unit 12 a-d that executes the instruction. If so, the operand of the instruction is set to point to the relevant register. If not, fifth step 35 is executed, allocating an intermediate register in the register file 16 a,b of the functional unit 12 a-d that has to execute the instruction. The operand of the instruction is set to point to the intermediate register. Fifth step 35 adds a copy instruction in an instruction word to command copy functional unit 14 to copy the operand from copy register file 18 to the intermediate register. Fifth step 35 also sets the copy control bit of the instruction that produces the operand as its result, so that the result is written into copy register file 18. A sixth step 36 tests whether all instructions have been processed. If not, third to fifth steps 33-35 are repeated. If so, a seventh step 37 is executed, assembling the program and storing it in a computer readable medium such as an addressable semiconductor memory in instruction issue unit 10 or an intermediate medium.

It will be appreciated that FIG. 3 shows merely the steps most directly involved with the invention. In practice many more steps may be added whose implementation is known per se. If necessary, for example, rescheduling steps may occur so as to ensure sufficient time for copy functional unit 14 to copy data, or to ensure free availability of sufficient registers.
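The operand availability pass of FIG. 3 can be sketched in simplified form. This is an illustration under stated assumptions and not the compiler itself: the Instr record, the CLUSTER_OF table, the naming of intermediate registers and the function insert_copies are all hypothetical, and placement of the copy instructions in instruction words is left out.

```python
from dataclasses import dataclass, field

# Assumed assignment of functional units 12 a-d to register files 16 a,b.
CLUSTER_OF = {"12a": "16a", "12b": "16a", "12c": "16b", "12d": "16b"}


@dataclass
class Instr:
    name: str                                     # also names the result it produces
    unit: str                                     # functional unit chosen during scheduling
    operands: list = field(default_factory=list)  # names of the producing instructions
    copy_bit: bool = False                        # copy control field


def insert_copies(instrs):
    """Set copy bits on producers and return the copy instructions that are needed."""
    producers = {instr.name: instr for instr in instrs}
    copies = {}  # (producer name, destination register file) -> intermediate register
    for consumer in instrs:
        dest_file = CLUSTER_OF[consumer.unit]
        for k, operand in enumerate(consumer.operands):
            producer = producers[operand]
            if CLUSTER_OF[producer.unit] == dest_file:
                continue  # same register file: the operand can be read directly
            producer.copy_bit = True  # result must also go to the copy register file
            key = (producer.name, dest_file)
            if key not in copies:
                copies[key] = f"tmp_{producer.name}_{dest_file}"
            consumer.operands[k] = copies[key]  # read from the intermediate register
    return [(src, dest, tmp) for (src, dest), tmp in copies.items()]


program = [
    Instr("mul", "12a"),
    Instr("add", "12b", ["mul"]),  # same cluster as the producer
    Instr("sub", "12c", ["mul"]),  # other cluster: needs a copy
]
copy_instrs = insert_copies(program)
print(copy_instrs)  # [('mul', '16b', 'tmp_mul_16b')]
```

A complete implementation would additionally have to schedule each copy instruction after its producing instruction and before the copy register concerned is overwritten, which is where the rescheduling steps mentioned above come in.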

Although the invention has been described as applied to copying of results between register files for functional units that do not have ports coupled to the same register file, it should be realized that the invention is more generally applicable. For example, copying may be used to reduce pressure on register use. In this case the copy register file is used to save a result that is overwritten in the register file to which it was originally written, the result being written back to that register file later when the result is needed. Writing back may be performed directly from the copy register file, or via memory etc. Thus if a value in a source register file is no longer needed in a particular register file after a copy operation, the register can be reused for other data since a copy can be made to another register file at a later point.

1. A data processing device comprising a plurality of functional units; a plurality of register files, each with ports coupled to the functional units from a respective cluster of the functional units; an instruction word issue unit for issuing an instruction word to the functional units, the instruction word being capable of comprising a combination of instructions for execution in a common instruction cycle by respective ones of the functional units respectively; a copy register file, coupled to the register files, for receiving a copy of data written into any one of the register files in response to writing of said data into that register file; a copy functional unit coupled to the copy register file, the copy functional unit being arranged for executing an instruction from the instruction word to copy a content of a register from the copy register file to an addressed register in the register files.
 2. A data processing apparatus according to claim 1, wherein the copy register file is coupled to at least part of the ports of the register files each via a respective port coupling link, arranged to copy data written to respective ones of the ports each to a register in a respective set of one or more registers in the copy register file, the respective set being selected dependent on the respective one of the ports, selection of the register in the set, if any, being at least partially irrespective of the register address with which the data is supplied to the respective one of the ports.
 3. A data processing apparatus according to claim 2, wherein at least one of the instructions comprises a field for indicating whether a result of the at least one of the instructions must be copied to the register in the respective set of one or more registers for the port to which the at least one of the instructions writes the result, the port coupling link being arranged to control whether or not data is copied, under control of a value in said field.
 4. A method of compiling a computer program for a processor with a plurality of functional units, and a plurality of register files that each have ports coupled to a respective cluster of functional units according to claim 1, the method comprising generating instructions for implementing a task; assigning each instruction to a respective functional unit; determining whether a first one of the instructions executed by a first one of the functional units requires a result produced by a second one of the functional units that does not belong to a same one of the clusters as the first one of the functional units; adding a copy instruction for a copy functional unit to copy the result from a copy register file to a first one of the register files which has a read port coupled to the first one of the functional units; storing the program with instruction words containing the instructions and the copy instruction in a computer readable medium, for use in execution by the processor.
 5. A method according to claim 4, comprising updating a second one of the instructions whose execution by the second one of the functional units results in said result to cause copying of the result to the copy register file when the result is written to a second one of the register files, said updating setting a copy control field in the second one of the instructions which enables copying to the copy register file.
 6. A computer program product comprising instructions for a processing device according to claim 1, the instructions comprising a first instruction for generating a result and writing the result to a first register file with a copy being written to a copy register file as part of execution of the first instruction, a second instruction for copying the result from the copy register file to a second register file, and a third instruction which uses the result from the second register file.
 7. A method of executing a program, the method comprising executing a first instruction with a first functional unit that produces a result and writes that result to a first register file in a first register addressed by a result address from the first instruction, and a copy of the result to a copy register that is selected in a copy register file at least partially independent of the result address; executing a copy instruction to copy the result from the copy register file to a second register file; executing a second instruction with a second functional unit, using the result as operand from the second register file.
 8. A method according to claim 7, wherein the copy register is selected dependent on the port to which the result is written to the first register file.
 9. A method according to claim 7, wherein copy control information from the first instruction is tested to determine whether or not the result is copied to the copy register file. 